DBMSs Should Talk Back Too
Natural language user interfaces to database systems have been studied for
several decades now. They have mainly focused on parsing and interpreting
natural language queries and translating them into a formal database language.
We envision the reverse functionality, where the system would be able to take
the internal result of that translation, say in SQL form, translate it back
into natural language, and show it to the initiator of the query for
verification.
Likewise, information extraction has received considerable attention in the
past ten years or so, identifying structured information in free text so that
it may then be stored appropriately and queried. Validating the stored records
through a backward translation into text would again be very powerful.
Verification and validation of the query and data input of a database system
are just one example of the many important applications that would
benefit greatly from having mature techniques for translating such database
constructs into free-flowing text. The problem appears deceptively simple: as
there are no ambiguities or other complications in interpreting internal
database elements, a straightforward translation initially appears adequate.
Reality teaches us quite the opposite, however, as the resulting text should be
expressive, i.e., accurate in capturing the underlying queries or data, and
effective, i.e., allowing their fast and unambiguous interpretation.
Achieving both of these qualities is very difficult and raises several
technical challenges that need to be addressed. In this paper, we first expose
the reader to several situations and applications that need translation into
natural language, thereby, motivating the problem. We then outline, by example,
the research problems that need to be solved, separately for data translations
and query translations.Comment: CIDR 200
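The verification loop the abstract envisions can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's system: the function name, template wording, and the tiny set of supported operators are all invented here, and a real translator would handle arbitrary SQL.

```python
# Hypothetical sketch: translate a parsed SQL SELECT back into English so the
# query's author can verify how the system interpreted it.

def sql_to_english(select_cols, table, where=None):
    """Render a simple SELECT query as an English sentence."""
    cols = ", ".join(select_cols) if select_cols != ["*"] else "all attributes"
    sentence = f"Find {cols} of every {table}"
    if where:
        col, op, val = where
        # Only a few comparison operators are covered in this sketch.
        op_text = {"=": "is", ">": "is greater than", "<": "is less than"}[op]
        sentence += f" whose {col} {op_text} {val}"
    return sentence + "."

print(sql_to_english(["name", "salary"], "employee", ("salary", ">", "50000")))
# Find name, salary of every employee whose salary is greater than 50000.
```

Even at this toy scale, the abstract's tension is visible: the template must stay accurate to the query (expressive) while reading naturally enough to be checked at a glance (effective).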
Requirement-driven creation and deployment of multidimensional and ETL designs
We present our tool for assisting designers in the error-prone and time-consuming tasks carried out at the early stages of a data warehousing project. Our tool semi-automatically produces multidimensional (MD) and ETL conceptual designs from a given set of business requirements (like SLAs) and data source descriptions. Subsequently, our tool translates both the MD and ETL conceptual designs produced into physical designs, so they can be further deployed on a DBMS and an ETL engine. In this paper, we describe the system architecture and present our demonstration proposal by means of an example.
Peer Reviewed. Postprint (author's final draft).
Adversarial Learning in Real-World Fraud Detection: Challenges and Perspectives
The data economy relies on data-driven systems, and complex machine learning
applications are fueled by them. Unfortunately, machine learning models are
exposed to fraudulent activities and adversarial attacks, which threaten their
security and trustworthiness. In the last decade or so, research interest in
adversarial machine learning has grown significantly, revealing how learning
applications can be severely impacted by effective attacks. Although early
results of adversarial machine learning indicate the huge potential of the
approach in specific domains such as image processing, there is still a gap in
both the research literature and practice regarding how to generalize
adversarial techniques to other domains and applications. Fraud detection is a
critical defense mechanism for the data economy, as it is for other
applications, and it poses several challenges for machine learning. In this
work, we describe how attacks against fraud detection systems differ from
other applications of adversarial machine learning, and we propose a number of
interesting directions to bridge this gap.
GEM: requirement-driven generation of ETL and multidimensional conceptual designs
Technical Report
At the early stages of a data warehouse design project, the main objective is to collect the business requirements and needs, and translate them into an appropriate conceptual, multidimensional design. Typically, this task is performed manually, through a series of interviews involving two different parties: the business analysts and the technical designers. Producing an appropriate conceptual design is an error-prone task that undergoes several rounds of reconciliation and redesigning until the business needs are satisfied. It is of great importance for the business of an enterprise to facilitate and automate such a process. The goal of our research is to provide designers with a semi-automatic means for producing conceptual multidimensional designs and also a conceptual representation of the extract-transform-load (ETL) processes that orchestrate the data flow from the operational sources to the data warehouse constructs. In particular, we describe a method that combines information about the data sources with the business requirements, for validating and completing these requirements if necessary, producing a multidimensional design, and identifying the ETL operations needed. We present our method in terms of the TPC-DS benchmark and show its applicability and usefulness.
Preprint
Synthesizing structured text from logical database subsets. EDBT
In the classical database world, information access has been based on a paradigm that involves structured, schema-aware queries and tabular answers. In the current environment, however, where information prevails in most activities of society, serving people, applications, and devices in dramatically increasing numbers, this paradigm has proved to be very limited. On the query side, much work has been done on moving towards keyword queries over structured data. In our previous work, we have touched the other side as well, and have proposed a paradigm that generates entire databases in response to keyword queries. In this paper, we continue in the same direction and propose synthesizing textual answers in response to queries of any kind over structured data. In particular, we study the transformation of a dynamically-generated logical database subset into a narrative through a customizable, extensible, and template-based process. In doing so, we exploit the structured nature of database schemas and describe three generic translation modules for different formations in the schema, called unary, split, and join modules. We have implemented the proposed translation procedure into our own database front end and have performed several experiments evaluating the textual answers generated as several features and parameters of the system are varied. We have also conducted a set of experiments measuring the effectiveness of such answers on users. The overall results are very encouraging and indicate the promise that our approach has for several applications.
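The template-based idea behind two of the translation modules can be sketched as follows. This is a hypothetical illustration only: the module names (unary, join) come from the abstract, but the function signatures, templates, and example data are invented here, and the paper's split module and schema-driven template selection are omitted.

```python
# Hypothetical sketch: rendering tuples of a logical database subset into
# narrative text with fill-in templates.

def unary_module(row, template):
    """Render a single tuple of one relation via a fill-in template."""
    return template.format(**row)

def join_module(left_row, right_row, template):
    """Render two joined tuples (e.g. linked by a foreign key) as one sentence,
    prefixing attribute names to keep the two sides distinct."""
    merged = {**{f"l_{k}": v for k, v in left_row.items()},
              **{f"r_{k}": v for k, v in right_row.items()}}
    return template.format(**merged)

movie = {"title": "Casablanca", "year": 1942}
director = {"name": "Michael Curtiz"}
print(unary_module(movie, "{title} was released in {year}."))
print(join_module(movie, director, "{l_title} was directed by {r_name}."))
```

Making such templates customizable and extensible per schema element, as the abstract describes, is what turns this from string formatting into a usable narrative front end.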
Quarry : digging up the gems of your data treasury
The design lifecycle of a data warehousing (DW) system is primarily led by the requirements of its end-users and the complexity of the underlying data sources. The process of designing a multidimensional (MD) schema and back-end extract-transform-load (ETL) processes is a long-term and mostly manual task. As enterprises shift to more real-time and 'on-the-fly' decision making, business intelligence (BI) systems require automated means for efficiently adapting a physical DW design to frequent changes of business needs. To address this problem, we present Quarry, an end-to-end system for assisting users of various technical skills in managing the incremental design and deployment of MD schemata and ETL processes. Quarry automates the physical design of a DW system from high-level information requirements. Moreover, Quarry provides tools for efficiently accommodating MD schema and ETL process designs to new or changed information needs of its end-users. Finally, Quarry facilitates the deployment of the generated DW design over an extensible list of execution engines. On-site, we will use a variety of examples to show how Quarry helps manage the complexity of the DW design lifecycle.
Peer Reviewed. Postprint (published version).
Interaction Mining: Making Business Sense of Customers Conversations through Semantic and Pragmatic Analysis
Via the Web, a wealth of information for business research is ready at our fingertips. Analyzing this unstructured information, however, can be very difficult. Analytics has become the business buzzword distinguishing traditional competitors from 'analytics competitors' who have dramatically boosted their revenues. The latter distinguish themselves through "expert use of statistics and modeling to improve a wide variety of functions" (Davenport, 2006, p. 105). However, not all information lends itself to statistics and models. Actually, most information on the Web is made for, and by, people communicating through 'rich' language. This richness of our language, and with it its real meaning, is typically missed or not adequately accounted for in (statistical) analytics (e.g., text mining), because it is hidden in semantics rather than form (e.g., syntax). In our efforts to turn unstructured data into structured data, important information, and with it our ability to distinguish ourselves from competitors, gets lost.
Using semantic web technologies for exploratory OLAP: A survey
Peer Reviewed. Postprint (author's final draft).